A Self-Supervised Representation Learning of Sentence Structure for Authorship Attribution

Abstract

The syntactic structure of the sentences in a document is a strong indicator of its author's writing style. Sentence representation learning has been widely explored in recent years, and it has been shown to improve the generalization of different downstream tasks across many domains. Although several studies that use probing methods suggest that these learned contextual representations implicitly encode some amount of syntax, explicit syntactic information further improves the performance of deep neural models in the domain of authorship attribution. These observations have motivated us to investigate the explicit representation learning of the syntactic structure of sentences. In this article, we propose a self-supervised framework for learning structural representations of sentences. The self-supervised network contains two components: a lexical sub-network and a syntactic sub-network, which take the sequence of words and their corresponding structural labels as input, respectively. Due to the n-to-1 mapping of words to structural labels, each word is embedded into a vector representation that mainly carries structural information. We evaluate the learned structural representations of sentences using probing tasks, and subsequently utilize them in the authorship attribution task. Our experimental results indicate that the structural embeddings significantly improve classification when concatenated with existing pre-trained word embeddings.

Similar resources

Syntactic Stylometry: Using Sentence Structure for Authorship Attribution

Most approaches to statistical stylometry have concentrated on lexical features, such as relative word frequencies or type-token ratios. Syntactic features have been largely ignored. This work attempts to fill that void by introducing a technique for authorship attribution based on dependency grammar. Syntactic features are extracted from texts using a common dependency parser, and those featur...

On the Use of Supervised Learning Method for Authorship Attribution

In this paper we investigate the use of a supervised learning method for authorship attribution, that is, for identifying the author of a text. We suggest a new, simple and efficient method, which is based merely on counting the number of repetitions of each alphabetic letter in the text, instead of using the traditional classification properties, such as the contents of the text an...

Deep Sentence-Level Authorship Attribution

We examine the problem of authorship attribution in collaborative documents. We seek to develop new deep learning models tailored to this task. We have curated a novel dataset by parsing Wikipedia’s edit history, which we use to demonstrate the feasibility of applying deep models to multi-author attribution at the sentence level. Though we attempt to formulate models which learn stylometric features base...

A Supervised Authorship Attribution Framework for Bengali Language

Authorship Attribution is a long-standing problem in Natural Language Processing. Several statistical and computational methods have been used to find a solution to this problem. In this paper, we have proposed methods to deal with the authorship attribution problem in Bengali. More specifically, we proposed a supervised framework consisting of lexical and shallow features, and investigated the...

A New Document Author Representation for Authorship Attribution

This paper proposes a novel representation for Authorship Attribution (AA), based on Concise Semantic Analysis (CSA), which has been successfully used in Text Categorization (TC). Our approach for AA, called Document Author Representation (DAR), builds document vectors in a space of authors, calculating the relationship between textual features and authors. In order to evaluate our approach, we...

Journal

Journal title: ACM Transactions on Knowledge Discovery From Data

Year: 2022

ISSN: 1556-472X, 1556-4681

DOI: https://doi.org/10.1145/3491203